Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation

Authors

Mu Li

Yinggong Zhao

Dongdong Zhang

Ming Zhou

Abstract

This paper addresses the problem of dynamic model parameter selection for log-linear model-based statistical machine translation (SMT) systems. We propose a principled method for this task by transforming it into a test-data-dependent development set selection problem. We present two algorithms for automatic development set construction and evaluate our method on several NIST data sets for the Chinese-English translation task. Experimental results show that our method can effectively adapt log-linear model parameters to different test data, and that it consistently achieves good translation performance compared with conventional methods that use a fixed model parameter setting across different data sets.
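The abstract does not spell out the two selection algorithms, so the following is only an illustrative sketch of the general idea of choosing a development set that depends on the test data. It assumes a simple bag-of-words cosine similarity between each candidate source sentence and the test set as a whole; the function names (`bag_of_words`, `cosine`, `select_dev_set`) and the similarity measure are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bag_of_words(sentences):
    """Count lowercase whitespace-separated tokens over a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_dev_set(candidate_pool, test_sources, k):
    """Pick the k (source, reference) pairs whose source side is most similar
    to the test data. Assumption: similarity is bag-of-words cosine; the
    paper's actual selection criteria may differ."""
    test_vec = bag_of_words(test_sources)
    scored = [
        (cosine(bag_of_words([src]), test_vec), i)
        for i, (src, _ref) in enumerate(candidate_pool)
    ]
    # Highest similarity first; ties are broken by original pool order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [candidate_pool[i] for _, i in scored[:k]]


pool = [
    ("the stock market fell", "ref1"),
    ("the cat sat on the mat", "ref2"),
    ("stock prices rose sharply", "ref3"),
]
selected = select_dev_set(pool, ["stock market prices today"], k=2)
```

In a setup like this, the selected pairs would then be handed to whatever parameter tuning procedure the SMT system already uses, so that the log-linear weights are tuned on data resembling the test set rather than on a fixed development set.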

Similar resources

A Smorgasbord of Features for Statistical Machine Translation

We describe a methodology for rapid experimentation in statistical machine translation which we use to add a large number of features to a baseline system exploiting features from a wide range of levels of syntactic representation. Feature values were combined in a log-linear model to select the highest scoring candidate translation from an n-best list. Feature weights were optimized directly a...

A Discriminative Lexicon Model for Complex Morphology

This paper describes successful applications of discriminative lexicon models to the statistical machine translation (SMT) systems into morphologically complex languages. We extend the previous work on discriminatively trained lexicon models to include more contextual information in making lexical selection decisions by building a single global log-linear model of translation selection. In offl...

The NTT statistical machine translation system for IWSLT2005

This paper reports the NTT statistical translation system participating in the evaluation campaign of IWSLT 2005. The NTT system is based on a phrase translation model and utilizes a large number of features with a log-linear model. We studied the various features recently developed in this research field and evaluate the system using supplied data as well as publicly available Chinese, Japanes...

A Comparison of Mixture and Vector Space Techniques for Translation Model Adaptation

In this paper, we propose two extensions to the vector space model (VSM) adaptation technique (Chen et al., 2013b) for statistical machine translation (SMT), both of which result in significant improvements. We also systematically compare the VSM techniques to three mixture model adaptation techniques: linear mixture, log-linear mixture (Foster and Kuhn, 2007), and provenance features (Chiang e...

Statistical Machine Translation of Euparl Data by using Bilingual N-grams

This work discusses translation results for the four Euparl data sets which were made available for the shared task “Exploiting Parallel Texts for Statistical Machine Translation”. All results presented were generated by using a statistical machine translation system which implements a log-linear combination of feature functions along with a bilingual n-gram translation model.

Publication date: 2010